Multiversion Concurrency Control (MVCC)
Introduction
Multiversion concurrency control (MVCC) is a method that gives users a concurrent and persistent view of distributed transactions across partitions. GigaSpaces keeps multiple versions of modified entries to ensure that users have a persistent view of the data that is consistent with the system of record (SoR).
Processing a large number of simultaneous transactions in Smart DIH requires extreme write throughput that cannot be paused. To sustain transactions in the platform, Space objects must not be locked. The MVCC mechanism provides an efficient solution, allowing massive updates while keeping the Space consistent with the systems of record (SoR). In this manner, the ACID properties of transactions are maintained, ensuring the consistency and integrity of the data before and after each update, even in highly available distributed systems.
For more information about the MVCC mechanism and how it is used in Smart DIH, read our blog on How to Achieve ACID Compliance on Distributed, Highly Available Systems (search for MVCC).
MVCC Flow
The diagram below shows how an update from the SoR enters a CDC stream, travels through the DI layer, and is finally applied in the Space.
- In the Space, only the area marked in pink is visible to the user.
- All newer updates (in blue) accumulate on top of that data and are not visible to the user.
- When the update is applied by the DI layer, the data becomes visible to the end user, and the MVCC update cycle continues.
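The visibility rule described above can be illustrated with a small conceptual model. This is only a sketch under stated assumptions: the class and method names (`MvccVisibilitySketch`, `stage`, `apply`, `read`) are hypothetical and are not part of the GigaSpaces API, and the model tracks a single entry in memory rather than a real Space.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model of one entry under MVCC (hypothetical names, not the GigaSpaces API).
final class MvccVisibilitySketch {
    // Versions that have been applied and are visible to readers, oldest first.
    private final List<String> applied = new ArrayList<>();
    // Newer versions that have arrived but have not been applied yet.
    private final List<String> pending = new ArrayList<>();

    // A new update arrives (for example from the CDC stream); readers do not see it yet.
    void stage(String value) {
        pending.add(value);
    }

    // The update is applied; all pending versions become visible.
    void apply() {
        applied.addAll(pending);
        pending.clear();
    }

    // Readers always get the latest applied version, or null if nothing was applied.
    String read() {
        return applied.isEmpty() ? null : applied.get(applied.size() - 1);
    }

    public static void main(String[] args) {
        MvccVisibilitySketch entry = new MvccVisibilitySketch();
        entry.stage("v1");
        entry.apply();
        entry.stage("v2"); // staged only, so readers still see the previous version
        System.out.println(entry.read());
        entry.apply();
        System.out.println(entry.read());
    }
}
```

The point of the model is that writers never block readers: a reader keeps seeing the last applied version while newer versions are being staged.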
MVCC Configuration Properties
Name | Type | Default Value | Description |
---|---|---|---|
space-config.mvcc.enabled | Boolean | false | MVCC is enabled for the Space |
space-config.mvcc.historical_entry_lifetime | Integer | 5 | Time limit for holding an entry version in the cache. This is the main criterion for deciding whether a particular entry version should be cleaned up. |
space-config.mvcc.historical_entry_lifetime_timeunit | TimeUnit | m | Unit of the time limit (milliseconds (ms), seconds (s), minutes (m)…) |
space-config.mvcc.historical_entries_limit | Integer | 5 | Maximum number of historical entries per UID. Cannot be 0. Entry lifetime takes precedence over this criterion: even if the number of entries in the cache is below the limit, entries that are too old are purged. |
space-config.mvcc.fixed_cleanup_delay_millis | Integer | 1000000 | Delay between cleanup iterations, in milliseconds. Set to 0 to enable a dynamic delay based on previous cleanups. |
The configuration settings for MVCC can be modified to tweak the impact on memory consumption.
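To make the interaction between the lifetime and limit settings concrete, here is a minimal sketch of one possible reading of the cleanup criteria in the table: versions older than the lifetime are purged first, and the limit then caps what remains. The class `MvccCleanupSketch`, the method `retain`, and the exact boundary handling are assumptions for illustration; the actual cleanup logic in the product may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative retention rule for the historical versions of one UID (hypothetical, not product code).
final class MvccCleanupSketch {
    /**
     * Returns the version timestamps that survive one cleanup pass.
     *
     * @param versionTimes   creation times in milliseconds, sorted oldest first
     * @param nowMillis      current time in milliseconds
     * @param lifetimeMillis maximum age of a historical version
     * @param limit          maximum number of historical versions per UID (must be > 0)
     */
    static List<Long> retain(List<Long> versionTimes, long nowMillis, long lifetimeMillis, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit cannot be 0 or negative");
        }
        List<Long> kept = new ArrayList<>();
        for (long created : versionTimes) {
            // Lifetime takes precedence: too-old versions go even if the count is below the limit.
            if (nowMillis - created <= lifetimeMillis) {
                kept.add(created);
            }
        }
        // The limit then keeps only the newest versions among the survivors.
        int excess = kept.size() - limit;
        return excess > 0 ? new ArrayList<>(kept.subList(excess, kept.size())) : kept;
    }
}
```

With the default values from the table (a lifetime of 5 minutes and a limit of 5), a version that is 6 minutes old would be purged under this reading even if it is the only historical version for its UID.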
Configuring a Space for MVCC
MVCC cannot be configured for a Space that is already active. To enable MVCC, a new Space must be created.
To enable a Space for MVCC, perform the following steps:
- Add a new Space by following the steps outlined in the User Guide: SpaceDeck - Spaces - Adding a Space
- In the Adding a New Space Parameters section, enable MVCC by adding the following Context Properties/Property Name: space-config.mvcc.enabled=true
- To change any of the other default parameters, add the corresponding Property Names.
- Once completed, click Create Space.
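The context properties entered in the steps above are plain key/value pairs. As a rough illustration, the sketch below collects them in a standard `java.util.Properties` object and prints them in `name=value` form. The class name `MvccSpacePropertiesSketch` and the override values (10 and 60000) are hypothetical examples; how the properties are handed to the Space (SpaceDeck form, deployment descriptor, or another mechanism) is outside the scope of this sketch.

```java
import java.util.Properties;

// Collects MVCC context properties as key/value pairs (illustrative only).
final class MvccSpacePropertiesSketch {
    static Properties mvccProperties() {
        Properties props = new Properties();
        // Required to turn MVCC on for a newly created Space.
        props.setProperty("space-config.mvcc.enabled", "true");
        // Optional overrides of the defaults; the values here are arbitrary examples.
        props.setProperty("space-config.mvcc.historical_entries_limit", "10");
        props.setProperty("space-config.mvcc.fixed_cleanup_delay_millis", "60000");
        return props;
    }

    public static void main(String[] args) {
        // Print each pair in the name=value form used for context properties.
        mvccProperties().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```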
Querying an MVCC Enabled Space
- The MVCC-enabled Space can be queried using the JDBCv3-compliant RESTful Services or the PostgreSQL-compliant data-gateway. For a persistent view, it is recommended that queries be part of an explicit transaction, because the consistency of the data cannot be guaranteed for non-transactional queries.
- For developers, the MVCC-enabled Space can also be queried using the Java proxy APIs, limited to basic APIs such as single operations by ID (write, read, take, update), read-multiple operations, and read/take with template matching. Developers can also use the SQL JDBCv3 driver or the SQLQuery Java API.
- MVCC Space operations are required to be transactional, to ensure that committed data is consistently reflected when fetched. A read operation without a transaction is allowed only when specifying the READ_COMMITTED isolation level modifier. Transactional reads can be performed with isolation levels such as DIRTY_READ, REPEATABLE_READ, or READ_COMMITTED, which is the default modifier for MVCC.
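The recommendation to run queries inside an explicit transaction can be sketched with the standard `java.sql` API. The example below is a minimal sketch, assuming an already-open `java.sql.Connection` to the Space; how that connection is obtained (driver class, URL, credentials) is deliberately left out, and the class and method names (`MvccTransactionalQuerySketch`, `queryInTransaction`) are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Runs a query inside an explicit JDBC transaction (illustrative helper, not a product API).
final class MvccTransactionalQuerySketch {
    /** Returns the first column of every row, read within a single explicit transaction. */
    static List<String> queryInTransaction(Connection conn, String sql) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // turn off auto-commit so the query runs in an explicit transaction
        try (Statement statement = conn.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            List<String> values = new ArrayList<>();
            while (rows.next()) {
                values.add(rows.getString(1));
            }
            conn.commit(); // end the transaction once all rows have been read
            return values;
        } catch (SQLException e) {
            conn.rollback(); // abandon the transaction on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

Several queries that must observe the same snapshot of the data would go between `setAutoCommit(false)` and `commit()`; a single query is shown here only to keep the sketch short.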
Limitations - Partial Support
- When using an MVCC Space, the following Space operations are not supported:
  - readIfExists, readIfExistsById, asyncRead
  - takeMultiple, takeByIds, takeIfExists, takeIfExistsById, asyncTake
  - writeMultiple
  - change, asyncChange
  - count, clear, aggregate, execute, iterator, dropClass, executorBuilder, asyncLoad
- MVCC is limited to the ALL IN CACHE configuration.
- Other cache policies, such as LRU and Tiered-Storage, and cache topologies, such as Local Cache/View, are not supported.
- Secondary unique indexes are not allowed.
- More than one data pipeline per table is not supported. Each object type in the Space must be populated through one specific pipeline.
Performance Impact
- Throughput (the number of transactions) decreases by 5-7%.
- MVCC adds a RAM overhead of 25% on average.
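For capacity planning, the figures above translate into simple arithmetic. The sketch below is a rough estimate only, assuming the overheads apply uniformly to a measured baseline; the class `MvccSizingSketch` and its methods are hypothetical, and real results depend on the workload and on the MVCC configuration properties.

```java
// Back-of-the-envelope sizing from the stated averages (illustrative, not a benchmark).
final class MvccSizingSketch {
    // Estimated RAM with MVCC enabled, given the 25% average overhead.
    static double ramWithMvcc(double baselineRam) {
        return baselineRam * 1.25;
    }

    // Estimated throughput range with MVCC enabled, given the 5-7% decrease:
    // element 0 is the pessimistic bound (7% lower), element 1 the optimistic bound (5% lower).
    static double[] throughputWithMvcc(double baselineThroughput) {
        return new double[] { baselineThroughput * 0.93, baselineThroughput * 0.95 };
    }

    public static void main(String[] args) {
        double[] range = throughputWithMvcc(10_000);
        System.out.println("RAM estimate for a 100 GB baseline: " + ramWithMvcc(100) + " GB");
        System.out.println("Throughput estimate: " + range[0] + " to " + range[1] + " transactions");
    }
}
```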